DRAFT, February 2, 2008



Shannon Theoretic Limits on Noisy
Compressive Sampling

Mehmet Akçakaya and Vahid Tarokh



Abstract

In this paper, we study the number of measurements required to recover a sparse signal in $\mathbb{C}^M$ with $L$ non-zero coefficients from compressed samples in the presence of noise. For a number of different recovery criteria, we prove that $O(L)$ (an asymptotically linear multiple of $L$) measurements are necessary and sufficient if $L$ grows linearly as a function of $M$. This improves on the existing literature that is mostly focused on variants of a specific recovery algorithm based on convex programming, for which $O(L \log(M - L))$ measurements are required. We also show that $O(L \log(M - L))$ measurements are required in the sublinear regime ($L = o(M)$).

I. Introduction

Let $\mathbb{C}$ denote the complex field and $\mathbb{C}^M$ the $M$-dimensional complex space. For any $\mathbf{x} \in \mathbb{C}^M$, let $\|\mathbf{x}\|_0$ denote the number of non-zero coefficients of $\mathbf{x}$. Whenever $\|\mathbf{x}\|_0 = L \ll M$, it is advantageous to measure a linear combination of the components of $\mathbf{x}$ as

$$\mathbf{y} = \mathbf{A}\mathbf{x},$$

where $\mathbf{A}$ is an $N \times M$ measurement matrix.

A decoder can then recover $\mathbf{x}$ from the observed vector $\mathbf{y}$ by solving the $\ell_0$ minimization problem

$$\min \|\mathbf{x}\|_0 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{A}\mathbf{x}.$$

M. Akçakaya and V. Tarokh are with the School of Engineering and Applied Sciences, Harvard University, Cambridge, MA,

This data acquisition technique for sparse signals is called compressive sampling [4], [5]. However, the optimization problem for recovery is NP-hard to solve [8]. In this light, alternative solution methods have been studied in the literature. One such approach is the $\ell_1$ regularization approach, where one solves

$$\min \|\mathbf{x}\|_1 \quad \text{s.t.} \quad \mathbf{y} = \mathbf{A}\mathbf{x},$$

and then establishes criteria under which the solution to this problem is also that of the $\ell_0$ minimization problem. By considering certain classes of Gaussian and partial Fourier ensembles, Candes and Tao showed in [4] that this recovery problem could be solved for $L = O(M)$ with $N = O(L)$ as long as the observations are noiseless. Another strand of work considers solving the $\ell_0$ recovery problem for a specific class of measurement matrices, such as the Vandermonde frames [1].
In practice, however, all the measurements are noisy, i.e.

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n} \tag{1}$$

for some additive noise $\mathbf{n} \in \mathbb{C}^N$. This motivates our work, where we study Shannon theoretic limits on the recovery of sparse signals in the presence of noise. More specifically, we are interested in the order of the number of measurements required, $N$, in terms of $L$ and $M$. We consider the linear sparsity regime $M = \beta L$ for $\beta > 2$. It was shown in [1] that $\beta > 2$ is required even in the noiseless setting for the unique recovery of the signal.
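As an illustration of the measurement model in Equation (1), the following minimal Python sketch draws one instance of $(\mathbf{A}, \mathbf{x}, \mathbf{y})$ in the linear sparsity regime. It is only a sketch under stated assumptions: the function names, the use of unit-variance complex Gaussian non-zero coefficients, and the choice of complex Gaussian noise of variance `nu2` (the noise model adopted in Section II) are illustrative and are not prescribed by the text at this point.

```python
import math
import random


def complex_gaussian(rng, variance):
    """Draw one circularly-symmetric complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)  # variance is split evenly between real and imaginary parts
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def sample_model(M, beta, N, nu2, seed=0):
    """Return (A, x, y, support) for y = A x + n with L = floor(M / beta) non-zeros.

    Hypothetical helper: the distribution of the non-zero entries of x is an
    assumption made only for this illustration.
    """
    rng = random.Random(seed)
    L = math.floor(M / beta)
    support = sorted(rng.sample(range(M), L))
    x = [0j] * M
    for i in support:
        x[i] = complex_gaussian(rng, 1.0)
    # N x M measurement matrix with i.i.d. unit-variance complex Gaussian entries
    A = [[complex_gaussian(rng, 1.0) for _ in range(M)] for _ in range(N)]
    # noisy measurements y_k = sum_i a_{k,i} x_i + n_k
    y = [sum(A[k][i] * x[i] for i in support) + complex_gaussian(rng, nu2)
         for k in range(N)]
    return A, x, y, support


if __name__ == "__main__":
    A, x, y, support = sample_model(M=12, beta=3.0, N=8, nu2=0.5)
    print("support:", support, "number of measurements:", len(y))
```

The sketch only fixes the shapes and statistics of the quantities involved; the question studied in the paper is how large `N` must be, relative to `L` and `M`, for `support` to be recoverable from `y` and `A`.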

Wainwright considered this problem with $\mathbf{n}$ being Gaussian noise in [10], and derived information theoretic limits on the noisy problem for a specific performance metric and a decoder that decodes to the closest subspace, showing that for the linear sparsity regime, the number of measurements required is also $O(L)$. In [11], Wainwright studied the $\ell_1$ constrained quadratic programming algorithm (LASSO) in the noisy setting and showed that in this case the number of measurements required is $N = O(L \log(M - L))$. Therefore there is a gap between what is achievable theoretically with an information theoretic decoder and what is achievable with a practical decoder based on $\ell_1$ regularization. The total power of the signal,

$$\|\mathbf{x}\|_2^2 = P,$$

grows unboundedly as a function of $N$ according to the analysis in [10]. The reason for this requirement is that at high dimensions, the performance metric in consideration is too stringent for an average case analysis.

In this note, we consider various performance metrics, some of which are of more Shannon theoretic spirit. We use a decoder based on joint typicality. Although such a decoder may not be computationally feasible in practice, it enables us to characterize the performance limits on the sparse representation problem. Using this decoder, we first derive a result similar to that of [10] for the same performance metric. For the other performance metrics that are more statistical in nature, we derive results stating that the number of required measurements is $O(L)$ and that $P$ does not have to grow with $N$.

The outline of this paper is given next. In Section II we define the problem to be considered in this paper, establish the notation and performance metrics, and state our main results and their implications. Section III and Section IV provide the proofs for the theorems stated in Section II. In Section V we state analogous theorems for the sublinear sparsity regime, $L = o(M)$.

II. Main Results 

We consider the compressive sampling of an unknown vector, $\mathbf{x} \in \mathbb{C}^M$. Let $\mathbf{x}$ have support $\mathcal{I} = \mathrm{supp}(\mathbf{x})$, where

$$\mathrm{supp}(\mathbf{x}) = \{ i \mid x_i \neq 0 \}$$

with $\|\mathbf{x}\|_0 = |\mathcal{I}| = L = \lfloor M/\beta \rfloor$, where $\beta > 2$. We also define

$$\mu(\mathbf{x}) = \min_{i \in \mathcal{I}} |x_i|. \tag{2}$$

We consider the noisy model given in Equation (1), where $\mathbf{n}$ is an additive noise vector with a complex circularly-symmetric Gaussian distribution with zero mean and covariance matrix $\nu^2 \mathbf{I}_N$, i.e. $\mathbf{n} \sim \mathcal{N}_C(\mathbf{0}, \nu^2 \mathbf{I}_N)$. Due to the presence of noise, $\mathbf{x}$ cannot be recovered exactly. However, a sparse recovery algorithm outputs an estimate $\hat{\mathbf{x}}$ with $\|\hat{\mathbf{x}}\|_0 = L$. We consider three performance metrics for the estimate:

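Error Metric 1: $$p_1(\hat{\mathbf{x}}, \mathbf{x}) = \mathbb{I}\Big( \{\hat{x}_i \neq 0 \ \ \forall i \in \mathcal{I}\} \cap \{\hat{x}_j = 0 \ \ \forall j \notin \mathcal{I}\} \Big) \tag{3}$$

Error Metric 2: $$p_2(\hat{\mathbf{x}}, \mathbf{x}) = \mathbb{I}\left( \frac{|\{i \mid \hat{x}_i \neq 0\} \cap \mathcal{I}|}{|\mathcal{I}|} > 1 - \alpha \right) \tag{4}$$

Error Metric 3: $$p_3(\hat{\mathbf{x}}, \mathbf{x}) = \mathbb{I}\left( \sum_{k \in \{i \mid \hat{x}_i \neq 0\} \cap \mathcal{I}} |x_k|^2 > (1 - \gamma) P \right) \tag{5}$$

where $\mathbb{I}(\cdot)$ is the indicator function and $\alpha, \gamma \in (0, 1)$.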

Error Metric 1 is referred to as the 0-1 loss metric, and it is the one considered by Wainwright [10]. 
Error Metric 2 is a statistical extension of Error Metric 1, and considers the recovery of most of the 
subspace information of x. Error Metric 3 is directly from Shannon Theory and characterizes the recovery 
of most of the energy of x. 
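The three metrics can be made concrete with a short Python sketch. It assumes that the estimate is summarized by its support (the set of indices the decoder declares non-zero), and it follows the indicator convention of Equations (3)-(5), so each function returns `True` when the corresponding recovery criterion is met. The function names and the toy vector are hypothetical and serve only as an illustration.

```python
def metric_1(est_support, true_support):
    """Equation (3): exact support recovery (0-1 loss)."""
    return set(est_support) == set(true_support)


def metric_2(est_support, true_support, alpha):
    """Equation (4): more than a (1 - alpha) fraction of the support is recovered."""
    overlap = len(set(est_support) & set(true_support))
    return overlap / len(set(true_support)) > 1 - alpha


def metric_3(est_support, x, gamma):
    """Equation (5): more than a (1 - gamma) fraction of the signal energy P is captured."""
    true_support = {i for i, v in enumerate(x) if v != 0}
    total_power = sum(abs(x[i]) ** 2 for i in true_support)
    captured = sum(abs(x[i]) ** 2 for i in set(est_support) & true_support)
    return captured > (1 - gamma) * total_power


if __name__ == "__main__":
    x = [0, 2, 0, 1, 0, 1]   # toy signal with support {1, 3, 5} and P = 6
    estimate = {1, 3, 4}     # one support index is missed
    print(metric_1(estimate, {1, 3, 5}))
    print(metric_2(estimate, {1, 3, 5}, alpha=0.5))
    print(metric_3(estimate, x, gamma=0.2))
```

In this toy case the estimate misses one of three support indices, so it fails the 0-1 criterion while still meeting the two statistical criteria for the chosen $\alpha$ and $\gamma$, which illustrates why Error Metric 1 is the most stringent of the three.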

Consider a sequence of vectors, $\{\mathbf{x}^{(M)}\}_M$ such that $\mathbf{x}^{(M)} \in \mathbb{C}^M$ with $\mathcal{I}^{(M)} = \mathrm{supp}(\mathbf{x}^{(M)})$, where $|\mathcal{I}^{(M)}| = L^{(M)} = \lfloor M/\beta \rfloor$. For $\mathbf{x}^{(M)}$, we will consider an ensemble of $N \times M$ Gaussian measurement matrices, $\mathbf{A}^{(M)}$, where $N$ is a function of $M$. Since the dependence of $L^{(M)}$, $\mathbf{x}^{(M)}$ and $\mathbf{A}^{(M)}$ on $M$ is implied by the vector $\mathbf{x}^{(M)}$, we will omit the superscript for brevity, and denote the support of $\mathbf{x}^{(M)}$ by $\mathcal{I}$, its size by $L$ and any measurement matrix from the ensemble by $\mathbf{A}$, whenever there is no ambiguity.

A decoder, $\mathcal{D}(\cdot)$, will output a set of indices, $\mathcal{D}(\mathbf{y})$. For a specific decoder, we consider the average probability of error, averaged over all Gaussian measurement matrices $\mathbf{A}$ with the $(i, j)$th term $a_{ij} \sim \mathcal{N}_C(0, 1)$:

$$p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)}) = \mathbb{E}_{\mathbf{A}}\big( p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) \big), \tag{6}$$

where $p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) = \mathbb{P}(\mathcal{D}(\mathbf{y}) \neq \mathcal{I})$ for $\mathbf{y} = \mathbf{A}\mathbf{x}^{(M)} + \mathbf{n}$ and $\mathbb{P}(\cdot)$ is the probability measure.

We say a decoder achieves asymptotic reliable sparse recovery if $p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)}) \to 0$ as $M \to \infty$. Similarly we say asymptotic reliable sparse recovery is not possible if $p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)})$ stays bounded away from 0 as $M \to \infty$.

We also use the notation

$$f(x) \succ g(x)$$

for either $f(x) = g(x) = 0$ or for non-decreasing non-negative functions $f(x)$ and $g(x)$, if $\exists\, x_0$ such that for all $x > x_0$,

$$\frac{f(x)}{g(x)} > 1.$$

Similarly we say $f(x) \prec g(x)$ if $g(x) \succ f(x)$.

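Theorem 2.1: (Achievability for Error Metric 1) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 1 if $\frac{L \mu^4(\mathbf{x}^{(M)})}{\log L} \to \infty$ as $L \to \infty$ and

$$N \succ C_1 L \tag{7}$$

for some constant $C_1 > 1$ that depends only on $\beta$, $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: The proof is given in Section III-C.1. □

Corollary 2.2: Let the conditions of Theorem 2.1 be satisfied. Then for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 1, $-\log \mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) > \xi\big) / \log L \to \infty$ as $L \to \infty$ for any $\xi \in (0, 1]$.

Proof: Markov's Inequality implies

$$\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) > \xi\big) \leq \frac{\mathbb{E}_{\mathbf{A}}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)})\big)}{\xi} = \frac{p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)})}{\xi}.$$

As shown in the proof of Theorem 2.1, $-\log p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)}) / \log L \to \infty$ as $L \to \infty$, yielding the desired result. □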
Theorem 2.3: (Converse for Error Metric 1) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 1 if

$$N \prec C_2 L \tag{8}$$

for some constant $C_2 > 0$ that depends only on $\beta$, $P$ and $\nu$.

Proof: The proof is given in Section IV-A.1. □

Corollary 2.4: Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given. Then for $\xi > 0$, for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 1, $\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) \to 1\big)$ goes to 1 exponentially fast as a function of $M$ if $N \prec \tilde{C}_2 L$, where $\tilde{C}_2 \leq C_2$ is a positive constant that depends only on $\beta$, $P$, $\nu$ and $\xi$.

Proof: The proof is given in Section IV-A.1. □

Theorem 2.5: (Achievability for Error Metric 2) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $L \mu^2(\mathbf{x}^{(M)})$ and $P$ are constant. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 2 if

$$N \succ C_3 L \tag{9}$$

for some constant $C_3 > 1$ that depends only on $\alpha$, $\beta$, $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: The proof is given in Section III-C.2. □

Corollary 2.6: Let the conditions of Theorem 2.5 be satisfied. Then for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 2, $\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) > \xi\big)$ is exponentially decaying to zero as a function of $M$ for any $\xi \in (0, 1]$.

Proof: As shown in the proof of Theorem 2.5, $p_{err}(\mathcal{D} \mid \mathbf{x}^{(M)})$ decays exponentially fast in $M$. Applying Markov's Inequality yields the desired result. □

Theorem 2.7: (Converse for Error Metric 2) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $P$ is constant. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 2 if

$$N \prec C_4 L \tag{10}$$

for some constant $C_4 > 0$ that depends only on $\alpha$, $\beta$, $P$ and $\nu$.

Proof: The proof is given in Section IV-A.2. □
Corollary 2.8: Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $P$ is constant. Then for $\xi > 0$, for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 2, $\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) \to 1\big)$ goes to 1 exponentially fast as a function of $M$ if $N \prec \tilde{C}_4 L$, where $\tilde{C}_4 \leq C_4$ is a non-negative constant that depends only on $\alpha$, $\beta$, $P$, $\nu$ and $\xi$.

Proof: The proof is analogous to the proof of Corollary 2.4. □

Theorem 2.9: (Achievability for Error Metric 3) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $P$ is constant. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 3 if

$$N \succ C_5 L \tag{11}$$

for some constant $C_5 > 1$ that depends only on $\beta$, $\gamma$, $P$ and $\nu$.

Proof: The proof is given in Section III-C.3. □

Corollary 2.10: Let the conditions of Theorem 2.9 be satisfied. Then for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 3, $\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) > \xi\big)$ is exponentially decaying to zero as a function of $M$ for any $\xi \in (0, 1]$.

Proof: The proof is analogous to the proof of Corollary 2.6. □
Theorem 2.11: (Converse for Error Metric 3) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $P$ is constant and the non-zero terms decay to zero at the same rate. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 3 if

$$N \prec C_6 L \tag{12}$$

for some constant $C_6 > 0$ that depends only on $\beta$, $\gamma$, $P$, $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: The proof is given in Section IV-A.3. □

Corollary 2.12: Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = \lfloor M/\beta \rfloor$, where $\beta > 2$ be given such that $P$ is constant and the non-zero terms decay to zero at the same rate. Then for $\xi > 0$, for any Gaussian measurement matrix, $\mathbf{A}$, and for Error Metric 3, $\mathbb{P}\big(p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) \to 1\big)$ goes to 1 exponentially fast as a function of $M$ if $N \prec \tilde{C}_6 L$, where $\tilde{C}_6 \leq C_6$ is a non-negative constant that depends only on $\beta$, $\gamma$, $P$, $\mu(\mathbf{x}^{(M)})$, $\nu$ and $\xi$.

Proof: The proof is analogous to the proof of Corollary 2.4. □



A. Discussion of The Results 

Theorem 2.1 implies that for Error Metric 1, $O(L)$ measurements are sufficient for asymptotic reliable sparse recovery. There is a clear gap between this number of measurements and the $O(L \log(M - L))$ measurements required by $\ell_1$ constrained quadratic programming [11]. In this proof, it is required that $\frac{L \mu^4(\mathbf{x})}{\log L} \to \infty$ as $L \to \infty$, which implies that $P$ grows without bound as a function of $N$.

Theorems 2.5 and 2.9 show that for Error Metrics 2 and 3, the number of required measurements to achieve asymptotic reliable sparse recovery is $N = O(L)$. In this case $P$ remains constant, which is a much less stringent requirement than that of Theorem 2.1. Converses to these theorems are established in Theorems 2.3, 2.7 and 2.11, which demonstrate that $O(L)$ measurements are asymptotically necessary.

Finally we note that Corollaries 2.6 and 2.10 imply that with overwhelming probability (i.e. the probability goes to 1 exponentially fast as a function of $M$) a given $N \times M$ Gaussian measurement matrix $\mathbf{A}$ can be used for asymptotic reliable sparse recovery (respectively for Error Metrics 2 and 3) as long as $N = O(L)$. Similarly Corollaries 2.8 and 2.12 prove that a given Gaussian matrix $\mathbf{A}$ will have $p_{err}(\mathbf{A} \mid \mathbf{x}^{(M)}) \to 1$ (respectively for Error Metrics 2 and 3) with overwhelming probability as long as the number of measurements is less than specified constant multiples of $L$. Corollaries 2.2 and 2.4 are similar in nature.

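III. Achievability Proofs

A. Notation

Let $\mathbf{a}_i$ denote the $i$th column of $\mathbf{A}$. For the measurement matrix $\mathbf{A}$, we define $\mathbf{A}_{\mathcal{J}}$ to be the matrix whose columns are $\{\mathbf{a}_j : j \in \mathcal{J}\}$. For any given matrix $\mathbf{B}$, we define $\Pi_{\mathbf{B}}$ to be the orthogonal projection matrix onto the subspace spanned by the columns of $\mathbf{B}$, i.e. $\Pi_{\mathbf{B}} = \mathbf{B}(\mathbf{B}^*\mathbf{B})^{-1}\mathbf{B}^*$. Similarly, we define $\Pi^{\perp}_{\mathbf{B}}$ to be the projection matrix onto the orthogonal complement of this subspace, i.e. $\Pi^{\perp}_{\mathbf{B}} = \mathbf{I} - \Pi_{\mathbf{B}}$.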

B. Joint Typicality 

In our analysis, we will use Gaussian measurement matrices and a suboptimal decoder based on joint 
typicality, as defined below: 

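Definition 3.1: (Joint Typicality) We say an $N \times 1$ noisy observation vector, $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$, and a set of indices $\mathcal{J} \subset \{1, 2, \ldots, M\}$, with $|\mathcal{J}| = L$, are $\delta$-jointly typical if $\mathrm{rank}(\mathbf{A}_{\mathcal{J}}) = L$ and

$$\left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| < \delta, \tag{13}$$

where $\mathbf{n} \sim \mathcal{N}_C(\mathbf{0}, \nu^2 \mathbf{I}_N)$, the $(i, j)$th entry of $\mathbf{A}$, $a_{ij} \sim \mathcal{N}_C(0, 1)$, and $\|\mathbf{x}\|_0 = L$.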
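To make Definition 3.1 concrete, the following Python sketch evaluates the typicality statistic in (13) for a candidate index set and enumerates all size-$L$ index sets that pass the test. It is a minimal illustration under stated assumptions, not the decoder analyzed in the proofs: the helper names are hypothetical, the projection is computed with modified Gram-Schmidt on plain Python lists, and the exhaustive search is feasible only for toy dimensions (consistent with the remark that such a decoder may not be computationally practical).

```python
import random
from itertools import combinations


def residual_energy(A, cols, y):
    """Return ||Pi_perp y||^2 for the span of the selected columns, or None if rank deficient."""
    n = len(y)
    basis = []  # orthonormal basis of span{a_j : j in cols}
    for j in cols:
        v = [A[i][j] for i in range(n)]
        for q in basis:
            c = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        if norm < 1e-12:
            return None  # rank(A_J) < L
        basis.append([vi / norm for vi in v])
    r = list(y)
    for q in basis:  # remove the component of y lying in the column span
        c = sum(qi.conjugate() * ri for qi, ri in zip(q, r))
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return sum(abs(ri) ** 2 for ri in r)


def is_jointly_typical(A, cols, y, nu2, delta):
    """Check condition (13) for the index set cols."""
    n, l = len(y), len(cols)
    energy = residual_energy(A, cols, y)
    if energy is None:
        return False
    return abs(energy / n - (n - l) / n * nu2) < delta


def typical_supports(A, y, L, nu2, delta):
    """Exhaustively list all size-L index sets that are delta-jointly typical with y."""
    M = len(A[0])
    return [cols for cols in combinations(range(M), L)
            if is_jointly_typical(A, cols, y, nu2, delta)]


if __name__ == "__main__":
    rng = random.Random(0)
    N, M, L = 8, 6, 2
    s = 0.5 ** 0.5
    A = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(M)] for _ in range(N)]
    x = [0j] * M
    x[1], x[4] = 1 + 0j, -1j
    # noise-free measurements (nu2 = 0) so that the check is deterministic
    y = [sum(A[k][i] * x[i] for i in range(M)) for k in range(N)]
    print(typical_supports(A, y, L, nu2=0.0, delta=1e-6))
```

The demonstration uses noise-free measurements so that only the true support leaves (numerically) zero residual energy; with noise, the threshold $\delta$ trades off the two error events bounded in Lemma 3.3.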

Lemma 3.2: For an index set $\mathcal{I} \subset \{1, 2, \ldots, M\}$ with $|\mathcal{I}| = L$,

$$\mathbb{P}\big(\mathrm{rank}(\mathbf{A}_{\mathcal{I}}) < L\big) = 0.$$

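Lemma 3.3:

• Let $\mathcal{I} = \mathrm{supp}(\mathbf{x})$ and assume (without loss of generality) that $\mathrm{rank}(\mathbf{A}_{\mathcal{I}}) = L$. Then for $\delta > 0$,

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| > \delta \right) \leq 2 \exp\left( - \frac{\delta^2}{4 \nu^4} \, \frac{N^2}{N - L + \frac{2\delta}{\nu^2} N} \right). \tag{14}$$

• Let $\mathcal{J}$ be an index set such that $|\mathcal{J}| = L$ and $|\mathcal{I} \cap \mathcal{J}| = K < L$, where $\mathcal{I} = \mathrm{supp}(\mathbf{x})$, and assume that $\mathrm{rank}(\mathbf{A}_{\mathcal{J}}) = L$. Then $\mathbf{y}$ and $\mathcal{J}$ are $\delta$-jointly typical with probability

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| < \delta \right) \leq \exp\left( - \frac{N - L}{4} \left( \frac{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 - \delta'}{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 + \nu^2} \right)^2 \right), \tag{15}$$

where

$$\delta' = \delta \, \frac{N}{N - L}.$$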

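Proof: We first note that for

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n} = \sum_{i \in \mathcal{I}} x_i \mathbf{a}_i + \mathbf{n},$$

we have

$$\Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} = \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{n},$$

and

$$\Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} = \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \Big( \sum_{i \in \mathcal{I} \setminus \mathcal{J}} x_i \mathbf{a}_i + \mathbf{n} \Big).$$

Furthermore $\Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} = \mathbf{U}_{\mathcal{I}} \mathbf{D} \mathbf{U}_{\mathcal{I}}^{\dagger}$, where $\mathbf{U}_{\mathcal{I}}$ is a unitary matrix that is a function of $\{\mathbf{a}_i : i \in \mathcal{I}\}$ (and independent of $\mathbf{n}$). $\mathbf{D}$ is a diagonal matrix with $N - L$ diagonal entries equal to 1, and the rest equal to 0. It is easy to see that

$$\big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} \big\|^2 = \| \mathbf{D} \mathbf{n}' \|^2,$$

where $\mathbf{n}' = \mathbf{U}_{\mathcal{I}}^{\dagger} \mathbf{n}$ has i.i.d. entries with distribution $\mathcal{N}_C(0, \nu^2)$. Without loss of generality, we may assume the non-zero entries of $\mathbf{D}$ are on the first $N - L$ diagonals, thus

$$\| \mathbf{D} \mathbf{n}' \|^2 = |n'_1|^2 + \cdots + |n'_{N-L}|^2.$$

Similarly, $\Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} = \mathbf{U}_{\mathcal{J}} \mathbf{D} \mathbf{U}_{\mathcal{J}}^{\dagger}$, where $\mathbf{U}_{\mathcal{J}}$ is a unitary matrix that is a function of $\{\mathbf{a}_j : j \in \mathcal{J}\}$ (and independent of $\mathbf{n}$ and $\{\mathbf{a}_i : i \in \mathcal{I} \setminus \mathcal{J}\}$) and $\mathbf{D}$ is as discussed above. Thus $\mathbf{a}'_i = \mathbf{U}_{\mathcal{J}}^{\dagger} \mathbf{a}_i$ has i.i.d. entries with distribution $\mathcal{N}_C(0, 1)$ for all $i \in \mathcal{I} \setminus \mathcal{J}$. It is easy to see that $\mathbf{n}'' = \mathbf{U}_{\mathcal{J}}^{\dagger} \mathbf{n}$ also has i.i.d. entries with distribution $\mathcal{N}_C(0, \nu^2)$. Thus

$$\big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \big\|^2 = \| \mathbf{D} \mathbf{w} \|^2 = |w_1|^2 + \cdots + |w_{N-L}|^2,$$

where $\mathbf{w} = \sum_{i \in \mathcal{I} \setminus \mathcal{J}} x_i \mathbf{a}'_i + \mathbf{n}''$ and the $w_i$ are i.i.d. with distribution $\mathcal{N}_C(0, \sigma_{\mathcal{J}}^2)$, where

$$\sigma_{\mathcal{J}}^2 = \sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 + \nu^2.$$

Let $\Omega_1 = \frac{\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} \|^2}{\nu^2}$ and $\Omega_2 = \frac{\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \|^2}{\sigma_{\mathcal{J}}^2}$. We note that both $\Omega_1$ and $\Omega_2$ are chi-square random variables with $(N - L)$ degrees of freedom. Thus to bound these probabilities, we must bound the tail of a chi-square random variable. We have,

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| > \delta \right) = \mathbb{P}\left( \big| \Omega_1 - (N - L) \big| > \frac{\delta}{\nu^2} N \right) = \mathbb{P}\left( \Omega_1 - (N - L) < -\frac{\delta}{\nu^2} N \right) + \mathbb{P}\left( \Omega_1 - (N - L) > \frac{\delta}{\nu^2} N \right) \tag{16}$$

and

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| < \delta \right) = \mathbb{P}\left( \left| \Omega_2 - (N - L) \frac{\nu^2}{\sigma_{\mathcal{J}}^2} \right| < \frac{\delta}{\sigma_{\mathcal{J}}^2} N \right) \leq \mathbb{P}\left( \Omega_2 - (N - L) < -(N - L)\left( 1 - \frac{\nu^2}{\sigma_{\mathcal{J}}^2} \right) + \frac{\delta}{\sigma_{\mathcal{J}}^2} N \right). \tag{17}$$

For a chi-square random variable, $\Omega$, with $(N - L)$ degrees of freedom [3], [7],

$$\mathbb{P}\left( \Omega - (N - L) \leq -2\sqrt{(N - L)\lambda} \right) \leq e^{-\lambda}, \tag{18}$$

and

$$\mathbb{P}\left( \Omega - (N - L) \geq 2\sqrt{(N - L)\lambda} + 2\lambda \right) \leq e^{-\lambda}. \tag{19}$$

By replacing $\Omega = \Omega_1$ and

$$\lambda = \left( \frac{\delta N}{2 \nu^2 \sqrt{N - L}} \right)^2$$

in Equation (18) and

$$\lambda = \frac{1}{4} \left( \sqrt{N - L + \frac{2\delta}{\nu^2} N} - \sqrt{N - L} \right)^2 \geq \frac{\delta^2}{4\nu^4} \, \frac{N^2}{N - L + \frac{2\delta}{\nu^2} N}$$

in Equation (19), we obtain using Equation (16)

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{I}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| > \delta \right) \leq \exp\left( -\frac{\delta^2}{4\nu^4} \, \frac{N^2}{N - L} \right) + \exp\left( -\frac{\delta^2}{4\nu^4} \, \frac{N^2}{N - L + \frac{2\delta}{\nu^2} N} \right) \leq 2 \exp\left( -\frac{\delta^2}{4\nu^4} \, \frac{N^2}{N - L + \frac{2\delta}{\nu^2} N} \right).$$

Similarly by replacing $\Omega = \Omega_2$ and

$$\lambda = \left( \frac{(N - L)\left( 1 - \frac{\nu^2}{\sigma_{\mathcal{J}}^2} \right) - \frac{\delta}{\sigma_{\mathcal{J}}^2} N}{2\sqrt{N - L}} \right)^2$$

in Equation (18), we obtain using Equation (17)

$$\mathbb{P}\left( \left| \frac{1}{N} \big\| \Pi^{\perp}_{\mathbf{A}_{\mathcal{J}}} \mathbf{y} \big\|^2 - \frac{N - L}{N} \nu^2 \right| < \delta \right) \leq \exp\left( -\frac{N - L}{4} \left( \frac{\sigma_{\mathcal{J}}^2 - \nu^2 - \delta'}{\sigma_{\mathcal{J}}^2} \right)^2 \right) = \exp\left( -\frac{N - L}{4} \left( \frac{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 - \delta'}{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 + \nu^2} \right)^2 \right).$$

□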
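The choice of $\lambda$ in the upper-tail bound can be sanity-checked numerically. Writing $d = N - L$ for the degrees of freedom and $t = \delta N / \nu^2$ for the deviation, the closed-form value $\lambda = t^2 / (4(d + 2t))$ appearing in the exponent of (14) should satisfy $2\sqrt{d\lambda} + 2\lambda \leq t$, so that the chi-square tail bound applies at deviation $t$. The Python sketch below checks this inequality on a small grid; the function names and grid values are hypothetical and chosen only for illustration.

```python
import math


def lower_tail_lambda(d, t):
    """Lambda solving 2*sqrt(d*lambda) = t (lower-tail chi-square bound)."""
    return t * t / (4.0 * d)


def upper_tail_lambda(d, t):
    """Closed-form lambda appearing in the exponent of (14): t^2 / (4 (d + 2 t))."""
    return t * t / (4.0 * (d + 2.0 * t))


def upper_tail_threshold(d, lam):
    """Deviation 2*sqrt(d*lambda) + 2*lambda covered by the upper-tail chi-square bound."""
    return 2.0 * math.sqrt(d * lam) + 2.0 * lam


if __name__ == "__main__":
    for d in (1, 10, 100, 1000):
        for t in (0.01, 1.0, 50.0, 5000.0):
            lam = upper_tail_lambda(d, t)
            # the threshold never exceeds t, so P(Omega - d > t) <= exp(-lam)
            print(d, t, upper_tail_threshold(d, lam) <= t)
```

Since the lower-tail exponent $t^2/(4d)$ is always at least as large as the upper-tail exponent, the sum of the two tail bounds is at most twice the upper-tail term, which is the form stated in (14).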



C. Proofs of Theorems For Different Error Metrics 
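We define the event

$$E_{\mathcal{J}} = \{ \mathbf{y} \text{ and } \mathcal{J} \text{ are } \delta\text{-jointly typical} \}$$

for all $\mathcal{J} \subset \{1, \ldots, M\}$, $|\mathcal{J}| = L$.
We also define the error event

$$E_0 = \{ \mathrm{rank}(\mathbf{A}_{\mathcal{I}}) < L \},$$

which results in an order reduction in the model, and implies that the decoder is looking through subspaces of incorrect dimension. By Lemma 3.2, we have $\mathbb{P}(E_0) = 0$.

Since the relationship between $M$ and $\mathbf{x}^{(M)}$ is implicit in the following proofs, we will suppress the superscript and just write $\mathbf{x}$ for brevity.

1) Proof of Theorem 2.1 (Error Metric 1): Clearly the decoder fails if $E_0$ or $E_{\mathcal{I}}^c$ occur or when one of $E_{\mathcal{J}}$ occurs for $\mathcal{J} \neq \mathcal{I}$. Thus

$$p_{err}(\mathcal{D} \mid \mathbf{x}) = \mathbb{P}\Big( E_0 \cup E_{\mathcal{I}}^c \cup \bigcup_{\mathcal{J}, \mathcal{J} \neq \mathcal{I}, |\mathcal{J}| = L} E_{\mathcal{J}} \Big) \leq \mathbb{P}(E_{\mathcal{I}}^c) + \sum_{\mathcal{J}, \mathcal{J} \neq \mathcal{I}, |\mathcal{J}| = L} \mathbb{P}(E_{\mathcal{J}}).$$

We let $N = (4C_0 + 1)L$ where $C_0 > 2 + \log(\beta - 1)$ is a constant. Thus $\delta' = \frac{N}{N - L}\delta = C_0' \delta$ with $C_0' > 1$. Also by the statement of Theorem 2.1, we have that $L\mu^4(\mathbf{x})$ grows faster than $\log L$. We note that this requirement is milder than that of [10], where the growth requirement is on $\mu^2(\mathbf{x})$ rather than $\mu^4(\mathbf{x})$. Since the decoder needs to distinguish between even the smallest non-overlapping coordinates, we let $\delta' = C\mu^2(\mathbf{x})$ for $0 < C < 1$. For computational convenience, we will only consider $2/3 < C < 1$.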
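By Lemma 3.3,

$$\mathbb{P}(E_{\mathcal{I}}^c) \leq 2 \exp\left( -\frac{C^2 C_0}{\nu^2} \, \frac{L \mu^4(\mathbf{x})}{\nu^2 + 2C\mu^2(\mathbf{x})} \right),$$

and by the condition on the growth of $\mu(\mathbf{x})$, the term in the exponent grows faster than $\log L$. Thus $\mathbb{P}(E_{\mathcal{I}}^c)$ goes to 0 faster than $\exp(-\log L)$.

Again by Lemma 3.3, for $\mathcal{J}$ with $|\mathcal{I} \cap \mathcal{J}| = K$,

$$\mathbb{P}(E_{\mathcal{J}}) \leq \exp\left( -\frac{N - L}{4} \left( \frac{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 - \delta'}{\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 + \nu^2} \right)^2 \right).$$

Since $\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 \geq (L - K)\mu^2(\mathbf{x})$, we have

$$\mathbb{P}(E_{\mathcal{J}}) \leq \exp\left( -\frac{N - L}{4} \left( \frac{(L - K)\mu^2(\mathbf{x}) - \delta'}{(L - K)\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right), \tag{20}$$

where $\mu(\mathbf{x})$ is defined in Equation (2).

The condition of Theorem 2.1 on $\mu(\mathbf{x})$ implies that $\mathbb{P}(E_{\mathcal{J}}) \to 0$ for all $K$. We note that this condition also implies $P \to \infty$ as $N$ grows without bound. This is due to the stringent requirements imposed by Error Metric 1 in high dimensions.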

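By a simple counting argument, the number of subsets $\mathcal{J}$ that overlap $\mathcal{I}$ in $K$ indices (and such that $\mathrm{rank}(\mathbf{A}_{\mathcal{J}}) = L$) is upper-bounded by

$$\binom{L}{K} \binom{M - L}{L - K}.$$

Thus

$$p_{err}(\mathcal{D} \mid \mathbf{x}) \leq 2 \exp\left( -\frac{C^2 C_0}{\nu^2} \, \frac{L \mu^4(\mathbf{x})}{\nu^2 + 2C\mu^2(\mathbf{x})} \right) + \sum_{K=0}^{L-1} \binom{L}{K} \binom{M - L}{L - K} \exp\left( -\frac{N - L}{4} \left( \frac{(L - K)\mu^2(\mathbf{x}) - \delta'}{(L - K)\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right)$$

$$= 2 \exp\left( -\frac{C^2 C_0}{\nu^2} \, \frac{L \mu^4(\mathbf{x})}{\nu^2 + 2C\mu^2(\mathbf{x})} \right) + \sum_{K'=1}^{L} \binom{L}{K'} \binom{M - L}{K'} \exp\left( -\frac{N - L}{4} \left( \frac{K'\mu^2(\mathbf{x}) - \delta'}{K'\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right).$$

We will now show that the summation goes to 0 as $M \to \infty$. We use the following bound

$$\exp\left( K' \log\left( \frac{L}{K'} \right) \right) \leq \binom{L}{K'} \leq \exp\left( K' \log\left( \frac{Le}{K'} \right) \right) \tag{21}$$

to upper bound each term of the summation, $s_{K'}$, by

$$s_{K'} \leq \exp\left( K' \log\left( \frac{Le}{K'} \right) + K' \log\left( \frac{(M - L)e}{K'} \right) - \frac{N - L}{4} \left( \frac{K'\mu^2(\mathbf{x}) - \delta'}{K'\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right)$$

$$= \exp\left( L \frac{K'}{L} \log \frac{e}{K'/L} + L \frac{K'}{L} \log \frac{(\beta - 1)e}{K'/L} - C_0 L \left( \frac{L \frac{K'}{L} \mu^2(\mathbf{x}) - C\mu^2(\mathbf{x})}{L \frac{K'}{L} \mu^2(\mathbf{x}) + \nu^2} \right)^2 \right).$$

We upper bound the whole summation by maximizing the function

$$f(z) = Lz \log\frac{e}{z} + Lz \log\frac{(\beta - 1)e}{z} - C_0 L \left( \frac{Lz\mu^2(\mathbf{x}) - C\mu^2(\mathbf{x})}{Lz\mu^2(\mathbf{x}) + \nu^2} \right)^2 = -2Lz \log z + Lz\big(2 + \log(\beta - 1)\big) - C_0 L \left( \frac{Lz\mu^2(\mathbf{x}) - C\mu^2(\mathbf{x})}{Lz\mu^2(\mathbf{x}) + \nu^2} \right)^2 \tag{22}$$

for $z \in [\frac{1}{L}, 1]$. If $f(z)$ attains its maximum at $z_0$, we then have

$$\sum_{K'=1}^{L} s_{K'} \leq L \exp(f(z_0)).$$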

For clarity of presentation, we will now state two technical lemmas. 

Lemma 3.4: Let $g(z)$ be a twice differentiable function on $[a, b]$ that has a continuous second derivative. If $g(a) < 0$, $g(b) < 0$, and $g'(a) < 0$, $g'(b) > 0$, and $g''(a) < 0$, $g''(b) < 0$, then $g''(x)$ is equal to 0 for at least two points in $[a, b]$.

Proof: Since $g'(a) < 0$ and $g'(b) > 0$, $g'(x)$ has to be increasing in a subset $E \subset [a, b]$. Then $g''(x_0) > 0$ for some $x_0 \in E$. Since $g''(a) < 0$, $g''(x_0) > 0$ and $g''(x)$ is continuous, there exists $x_1 \in [a, x_0]$ such that $g''(x_1) = 0$. Similarly, since $g''(b) < 0$, there exists $x_2 \in [x_0, b]$ such that $g''(x_2) = 0$. □

Lemma 3.5: Let $p(z) = a_4 z^4 + a_3 z^3 + a_2 z^2 + a_1 z + a_0$ be a polynomial over $\mathbb{R}$ such that $a_4, a_3, a_0 > 0$. Then $p(z)$ can have at most two positive roots.

Proof: Let $r_p^{(1)}, r_p^{(2)}, r_p^{(3)}, r_p^{(4)}$ be the roots of $p(z)$, counting multiplicities. Since

$$r_p^{(1)} r_p^{(2)} r_p^{(3)} r_p^{(4)} = \frac{a_0}{a_4} > 0,$$

the number of positive roots must be even, and since

$$r_p^{(1)} + r_p^{(2)} + r_p^{(3)} + r_p^{(4)} = -\frac{a_3}{a_4} < 0,$$

not all the roots could be positive. The result follows. □
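Lemma 3.6: For $L$ sufficiently large, $f(z)$ (see Equation (22)) is negative for all $z \in [\frac{1}{L}, 1]$. Moreover the endpoints of the interval, $z_0^{(1)} = \frac{1}{L}$ and $z_0^{(2)} = 1$, are its local maxima.

Proof: We first confirm that $f(z)$ is negative at the endpoints of the interval. We use the notation $\simeq$ for denoting the behavior of $f(z)$ for large $L$, and $\prec$ and $\succ$ for inequalities that hold asymptotically.

$$f\left(\frac{1}{L}\right) = 2\log L + 2 + \log(\beta - 1) - C_0 L \left( \frac{(1 - C)\mu^2(\mathbf{x})}{\mu^2(\mathbf{x}) + \nu^2} \right)^2 \prec 0 \tag{23}$$

for sufficiently large $L$, since $L\mu^4(\mathbf{x})$ grows faster than $\log L$. Also for large $L$, we have

$$f(1) = L\big(2 + \log(\beta - 1)\big) - C_0 L \left( \frac{(L - C)\mu^2(\mathbf{x})}{L\mu^2(\mathbf{x}) + \nu^2} \right)^2 \simeq L\big(2 + \log(\beta - 1) - C_0\big) \prec 0. \tag{24}$$

We now examine the derivative of $f(z)$, given by

$$f'(z) = -2L\log z + L\log(\beta - 1) - 2C_0 L^2 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{Lz - C}{\big(Lz\mu^2(\mathbf{x}) + \nu^2\big)^3}.$$

Also,

$$f'\left(\frac{1}{L}\right) = 2L\log L + L\log(\beta - 1) - 2C_0 L^2 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{1 - C}{\big(\mu^2(\mathbf{x}) + \nu^2\big)^3} = L \left( 2\log L + \log(\beta - 1) - 2C_0 (1 - C) \frac{L\mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big)}{\big(\mu^2(\mathbf{x}) + \nu^2\big)^3} \right) \prec 0$$

for sufficiently large $L$, since $L\mu^4(\mathbf{x})$ grows faster than $\log L$. Similarly

$$f'(1) = L\log(\beta - 1) - 2C_0 L^2 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{L - C}{\big(L\mu^2(\mathbf{x}) + \nu^2\big)^3} \simeq L\log(\beta - 1) - 2C_0 \frac{1}{\mu^2(\mathbf{x})} \big(\nu^2 + C\mu^2(\mathbf{x})\big) \succ 0,$$

since $\frac{1}{\mu^2(\mathbf{x})}$ grows slower than $L$.
Additionally,

$$f''(z) = -\frac{2L}{z} - 2C_0 L^3 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{-2Lz\mu^2(\mathbf{x}) + \nu^2 + 3C\mu^2(\mathbf{x})}{\big(Lz\mu^2(\mathbf{x}) + \nu^2\big)^4}$$

$$= \frac{-2L}{z \big(Lz\mu^2(\mathbf{x}) + \nu^2\big)^4} \Big[ \big(Lz\mu^2(\mathbf{x}) + \nu^2\big)^4 + C_0 L^2 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \big(-2Lz\mu^2(\mathbf{x}) + \nu^2 + 3C\mu^2(\mathbf{x})\big) z \Big]. \tag{25}$$

Thus,

$$f''\left(\frac{1}{L}\right) = -2L^2 - 2C_0 L^3 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{(3C - 2)\mu^2(\mathbf{x}) + \nu^2}{\big(\mu^2(\mathbf{x}) + \nu^2\big)^4} < 0,$$

where we used $C > 2/3$, and

$$f''(1) = -2L - 2C_0 L^3 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \frac{-2L\mu^2(\mathbf{x}) + \nu^2 + 3C\mu^2(\mathbf{x})}{\big(L\mu^2(\mathbf{x}) + \nu^2\big)^4} \prec 0$$

for sufficiently large $L$, since the second term is of order $\frac{1}{\mu^2(\mathbf{x})}$, which grows slower than $L$.

Since $f(z)$ is a twice differentiable function on $[\frac{1}{L}, 1]$ with a continuous second derivative, Lemma 3.4 implies that $f''(z)$ crosses 0 at least twice in this interval. Next we examine the polynomial (see Equation (25))

$$p(z) = \big(Lz\mu^2(\mathbf{x}) + \nu^2\big)^4 + C_0 L^2 \mu^4(\mathbf{x}) \big(\nu^2 + C\mu^2(\mathbf{x})\big) \big(-2Lz\mu^2(\mathbf{x}) + \nu^2 + 3C\mu^2(\mathbf{x})\big) z.$$

Since $p(z)$ satisfies the conditions of Lemma 3.5, we conclude that it has at most two positive roots, and thus at most two roots of $p(z)$ can lie in $[\frac{1}{L}, 1]$. In other words $f''(z)$ can cross 0 for $z \in [\frac{1}{L}, 1]$ at most twice. Combining this with the previous information, we conclude that $f''(z)$ crosses 0 exactly twice in this interval, and that $f'(z)$ crosses 0 only once, and this point is a local minimum of $f(z)$. Thus the local maxima of $f(z)$ are the endpoints $z_0^{(1)} = \frac{1}{L}$ and $z_0^{(2)} = 1$. □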
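The qualitative claim of Lemma 3.6 can be checked numerically for a particular parameter choice. The Python sketch below evaluates $f(z)$ from Equation (22) on the grid $z = K'/L$, $K' = 1, \ldots, L$. All numerical values ($L = 1000$, $\beta = 3$, $C_0 = 3$, $C = 0.8$, $\mu^2 = \nu^2 = 1$) are assumptions made for illustration; they are chosen to satisfy $C_0 > 2 + \log(\beta - 1)$ and $2/3 < C < 1$ and are not taken from the paper.

```python
import math


def f(z, L, beta, C0, C, mu2, nu2):
    """Exponent f(z) of Equation (22), with mu2 = mu^2(x) and nu2 = nu^2."""
    ratio = (L * z * mu2 - C * mu2) / (L * z * mu2 + nu2)
    return (-2.0 * L * z * math.log(z)
            + L * z * (2.0 + math.log(beta - 1.0))
            - C0 * L * ratio ** 2)


# Hypothetical parameters: C0 > 2 + log(beta - 1) ~ 2.69 and 2/3 < C < 1.
L, beta, C0, C, mu2, nu2 = 1000, 3.0, 3.0, 0.8, 1.0, 1.0
vals = [f(k / L, L, beta, C0, C, mu2, nu2) for k in range(1, L + 1)]

if __name__ == "__main__":
    print("f(1/L) =", vals[0])
    print("f(1)   =", vals[-1])
    print("largest value on the grid is attained at K' =", vals.index(max(vals)) + 1)
```

For these values the exponent is negative over the whole grid and decreases away from the left endpoint before rising again towards $z = 1$, in line with the two endpoints being the local maxima; this is only a spot check for one parameter setting and does not replace the asymptotic argument above.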
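Thus we have,

$$p_{err}(\mathcal{D} \mid \mathbf{x}) \leq 2 \exp\left( -\frac{C^2 C_0}{\nu^2} \, \frac{L \mu^4(\mathbf{x})}{\nu^2 + 2C\mu^2(\mathbf{x})} \right) + \sum_{K'=1}^{L} \exp\Big( \max\big\{ f(z_0^{(1)}), f(z_0^{(2)}) \big\} \Big)$$

$$= 2 \exp\left( -\frac{C^2 C_0}{\nu^2} \, \frac{L \mu^4(\mathbf{x})}{\nu^2 + 2C\mu^2(\mathbf{x})} \right) + \exp\left( \log L + \max\left\{ f\left(\frac{1}{L}\right), f(1) \right\} \right). \tag{26}$$

From Equations (23) and (24), it is clear that $\log(L) + \max\big\{ f(\frac{1}{L}), f(1) \big\} \to -\infty$ as $L \to \infty$. Hence with the conditions of Theorem 2.1, $p_{err}(\mathcal{D} \mid \mathbf{x}) \to 0$ as $L \to \infty$.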

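2) Proof of Theorem 2.5 (Error Metric 2): For asymptotic reliable recovery with Error Metric 2, we require that $\mathbb{P}(E_{\mathcal{J}})$ goes to 0 for only $K \leq (1 - \alpha)L$ with $\alpha \in (0, 1)$. By a re-examination of Equation (20), we observe that the right hand side of

$$\mathbb{P}(E_{\mathcal{J}}) \leq \exp\left( -\frac{N - L}{4} \left( \frac{(L - K)\mu^2(\mathbf{x}) - \delta'}{(L - K)\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right)$$

converges to 0 asymptotically, even when $L\mu^2(\mathbf{x})$ converges to a constant. In this case $P$ does not have to grow with $N$. We let $\delta > 0$ (and hence $\delta'$) be a constant, and let $N = (4C_3 + 1)L$ for a suitably large constant $C_3$.

Given the decay rate of $\mu^2(\mathbf{x})$ and that $\delta' > 0$ is arbitrary, we note that this constant only depends on $\alpha$, $\beta$, $\mu(\mathbf{x})$ and $\nu$. Hence

$$p_{err}(\mathcal{D} \mid \mathbf{x}) \leq \mathbb{P}(E_{\mathcal{I}}^c) + \sum_{K=0}^{(1 - \alpha)L} \binom{L}{K} \binom{M - L}{L - K} \exp\left( -\frac{N - L}{4} \left( \frac{(L - K)\mu^2(\mathbf{x}) - \delta'}{(L - K)\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right)$$

$$\leq 2 \exp\left( -\frac{\delta^2}{4\nu^4} \, \frac{L (4C_3 + 1)^2}{4C_3 + \frac{2\delta}{\nu^2}(4C_3 + 1)} \right) + \sum_{K' = \alpha L}^{L} \exp\left( L H\left( \frac{K'}{L} \right) + (M - L) H\left( \frac{K'}{M - L} \right) - C_3 L \left( \frac{K'\mu^2(\mathbf{x}) - \delta'}{K'\mu^2(\mathbf{x}) + \nu^2} \right)^2 \right),$$

where $H(a) = -a\log(a) - (1 - a)\log(1 - a)$ is the entropy function for $a \in [0, 1]$. Since $K'$ is greater than a linear factor of $L$ and since $P$ is a constant, and using Equation (26), we see that $p_{err}(\mathcal{D} \mid \mathbf{x}) \to 0$ exponentially fast as $L \to \infty$.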
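3) Proof of Theorem 2.9 (Error Metric 3): An error occurs for Error Metric 3 if

$$\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 \geq \gamma P.$$

Thus we can bound the error event for $\mathcal{J}$ from Lemma 3.3 as

$$\mathbb{P}(E_{\mathcal{J}}) \leq \exp\left( -\frac{N - L}{4} \left( \frac{\gamma P - \delta'}{\gamma P + \nu^2} \right)^2 \right).$$

Let $\delta' > 0$ be a fraction of $\gamma P$. We note that the number of index sets $\mathcal{J} \subset \{1, 2, \ldots, M\}$ with $|\mathcal{J}| = L$ is upper-bounded by $\binom{M}{L}$. Thus,

$$p_{err}(\mathcal{D} \mid \mathbf{x}) \leq 2 \exp\left( -\frac{\delta^2}{4\nu^4} \, \frac{N^2}{N - L + \frac{2\delta}{\nu^2} N} \right) + \binom{M}{L} \exp\left( -\frac{N - L}{4} \left( \frac{\gamma P - \delta'}{\gamma P + \nu^2} \right)^2 \right).$$

For $N \succ C_5 L$, a similar argument to that of Section III-C.2 proves that $p_{err}(\mathcal{D} \mid \mathbf{x}) \to 0$ exponentially fast as $L \to \infty$, where $C_5$ depends only on $\beta$, $\gamma$, $P$ and $\nu$.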

IV. Proofs of Converses 
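Throughout this section, we will write $\mathbf{x}$ for $\mathbf{x}^{(M)}$ whenever there is no ambiguity.

A. Genie-Aided Decoding and Connection with Noisy Communication Systems

Let the support of $\mathbf{x}$ be $\mathcal{I} = \{i_1, i_2, \ldots, i_L\}$ with $i_1 < i_2 < \cdots < i_L$. We assume a genie provides $\mathbf{x}_{\mathcal{I}} = (x_{i_1}, x_{i_2}, \ldots, x_{i_L})^T$ to the decoder defined in Section II. Clearly we have

$$p_{err}^{genie} \leq p_{err}.$$

1) Proof of Theorem 2.3 (Error Metric 1): We derive a lower bound on the probability of genie-aided decoding error for any decoder. Consider a Multiple Input Single Output (MISO) transmission model given by an encoder, a decoder and a channel. The channel is specified by $\mathbf{H} = [x_{i_1} \ x_{i_2} \ \cdots \ x_{i_L}] = \mathbf{x}_{\mathcal{I}}^T$. The encoder, $\mathcal{E}_1 : \{0, 1\}^M \to \mathbb{C}^{L \times N}$, maps one of the $\binom{M}{L}$ possible binary vectors of (Hamming) weight $L$ to a codeword in $\mathbb{C}^{L \times N}$. This codeword is then transmitted over the MISO channel in $N$ channel uses. The decoder is a mapping $\mathcal{D}_1 : \mathbb{C}^N \to \{0, 1\}^M$ such that its output $\hat{\mathbf{c}}$ has weight $L$.

Let $\mathbf{c} \in \{0, 1\}^M$ and $\mathrm{supp}(\mathbf{c}) = \mathcal{J} = \{j_1, j_2, \ldots, j_L\}$ with $j_1 < j_2 < \cdots < j_L$. Let $\mathbf{z}_k^{\mathcal{J}} = (a_{k,j_1}, a_{k,j_2}, \ldots, a_{k,j_L})^T$, where $a_{m,n}$ is the $(m, n)$th term of $\mathbf{A}$. The codebook is specified by

$$\mathcal{C} = \Big\{ \big( \mathbf{z}_1^{\mathcal{J}}, \mathbf{z}_2^{\mathcal{J}}, \ldots, \mathbf{z}_N^{\mathcal{J}} \big) : \mathcal{J} \subset \{1, 2, \ldots, M\}, |\mathcal{J}| = L \Big\}$$

and has size $\binom{M}{L}$. The output of the channel, $\mathbf{y}$, is

$$y_k = \mathbf{H} \mathbf{z}_k^{\mathcal{I}} + n_k \quad \text{for } k = 1, 2, \ldots, N,$$

where $y_k$ and $n_k$ are the $k$th coordinates of $\mathbf{y}$ and $\mathbf{n}$ respectively. The average signal power is $\mathbb{E}(\|\mathbf{z}_k^{\mathcal{J}}\|^2) = L$, and the noise variance is $\mathbb{E}|n_k|^2 = \nu^2$. The capacity of this channel in $N$ channel uses (without channel knowledge at the transmitter) is given by [9]

$$C_{MISO} = N \log\left( 1 + \frac{\mathbb{E}(\|\mathbf{z}_k^{\mathcal{J}}\|^2)}{L \nu^2} \mathbf{H}\mathbf{H}^{\dagger} \right) = N \log\left( 1 + \frac{P}{\nu^2} \right).$$

After $N$ channel uses, $p_{err}^{genie} > 0$ if $\log \binom{M}{L} > C_{MISO}$. Using

$$\frac{1}{M + 1} \exp\left( M H\left( \frac{L}{M} \right) \right) \leq \binom{M}{L} \leq \exp\left( M H\left( \frac{L}{M} \right) \right), \tag{27}$$

we obtain the equivalent condition

$$N < \frac{1}{\log\left( 1 + \frac{P}{\nu^2} \right)} M H\left( \frac{1}{\beta} \right) - o(M),$$

where $M = \beta L$, and $H(\cdot)$ is the entropy function.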
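The counting argument above can be illustrated numerically. The Python sketch below checks the entropy bounds (27) on $\binom{M}{L}$ for one pair $(M, L)$ and evaluates the resulting threshold on $N$ below which $\log\binom{M}{L}$ exceeds $C_{MISO}$. It assumes natural logarithms throughout (consistent with the use of $\exp$ in (27)); the function names and the numerical values of $M$, $L$, $P$ and $\nu^2$ are hypothetical.

```python
import math


def entropy(a):
    """Entropy function H(a) = -a log a - (1 - a) log(1 - a), in nats."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def log_binom(M, L):
    """Natural logarithm of the binomial coefficient C(M, L)."""
    return math.log(math.comb(M, L))


def miso_capacity(N, P, nu2):
    """C_MISO = N log(1 + P / nu^2) for N channel uses."""
    return N * math.log(1.0 + P / nu2)


def converse_threshold(M, L, P, nu2):
    """Value of N below which the lower bound in (27) already exceeds C_MISO."""
    return (M * entropy(L / M) - math.log(M + 1)) / math.log(1.0 + P / nu2)


if __name__ == "__main__":
    M, L, P, nu2 = 300, 100, 4.0, 1.0   # beta = 3, illustrative power and noise levels
    print("log C(M, L)           :", log_binom(M, L))
    print("M H(L/M)              :", M * entropy(L / M))
    print("threshold on N        :", converse_threshold(M, L, P, nu2))
```

For any `N` strictly below the printed threshold, the number of candidate supports exceeds what the MISO channel can convey in `N` uses, which is the mechanism behind the linear-in-$L$ lower bound.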

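To prove Corollary 2.4, we first show that with high probability, all codewords of a Gaussian codebook satisfy a power constraint. Combining this with the strong converse of the channel coding theorem will complete the proof [6]. If $\mathbf{A}$ is chosen from a Gaussian distribution, then by Inequality (19),

$$\mathbb{P}\left( \frac{1}{L} \big\| \mathbf{z}_k^{\mathcal{J}} \big\|^2 \geq 1 + 2\sqrt{\frac{\lambda}{L}} + \frac{2\lambda}{L} \right) \leq e^{-\lambda}$$

for any $\mathcal{J} \subset \{1, 2, \ldots, M\}$, $|\mathcal{J}| = L$ and for $k = 1, 2, \ldots, N$. Let $\lambda = L\big(\beta H(\frac{1}{\beta}) + \xi\big)$ and $\Lambda = 2\sqrt{\beta H(\frac{1}{\beta}) + \xi} + 2\big(\beta H(\frac{1}{\beta}) + \xi\big)$ for $\xi > 0$. By the union bound over all $\binom{M}{L}$ possible index sets $\mathcal{J}$ and $k = 1, 2, \ldots, N$,

$$\mathbb{P}\left( \frac{1}{L} \big\| \mathbf{z}_k^{\mathcal{J}} \big\|^2 \leq (1 + \Lambda), \ \forall \mathcal{J}, \ k = 1, \ldots, N \right) \geq 1 - N \exp(-\xi L).$$

If the power constraint is satisfied, then the strong converse of the channel coding theorem implies that $p_{err}(\mathbf{A} \mid \mathbf{x})$ goes to 1 exponentially fast in $M$ if

$$N \prec \frac{1}{\log\left( 1 + \frac{P(1 + \Lambda)}{\nu^2} \right)} M H\left( \frac{1}{\beta} \right).$$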
2) Proof of Theorem 2.7 (Error Metric 2): For any given $\mathbf{x}$ with $\|\mathbf{x}\|_0 = L$, we will prove the contrapositive. Let $P_{e2}^{(M)}$ denote the probability of error with respect to Error Metric 2 for $\mathbf{x} \in \mathbb{C}^M$. We show that $N \succ C_4 L$ if $P_{e2}^{(M)} \to 0$.

Consider a single input single output system, $\mathcal{S}$, whose input is $\mathbf{c} \in \{0, 1\}^M$, and whose output is $\hat{\mathbf{c}} \in \{0, 1\}^M$, such that $\|\mathbf{c}\|_0 = \|\hat{\mathbf{c}}\|_0 = L$, and $\|\mathbf{c} - \hat{\mathbf{c}}\|_0 \leq 2\alpha L$. The last condition states that the support of $\mathbf{c}$ and that of $\hat{\mathbf{c}}$ overlap in more than $(1 - \alpha)L$ locations, i.e. $P_{e2}^{(M)} = 0$. We are interested in the rates at which one can communicate reliably over $\mathcal{S}$.

In our case $d(\mathbf{c}, \hat{\mathbf{c}}) = \frac{1}{M} \sum_{k=1}^{M} d_H(c_k, \hat{c}_k)$, where $\mathbf{c}$ is uniformly distributed among the $\binom{M}{L}$ binary vectors of length $M$ and weight $L$, and $d_H(\cdot, \cdot)$ is the Hamming distance. Thus $D \leq \frac{2\alpha L}{M} = \frac{2\alpha}{\beta}$. We also note that $\mathcal{S}$ can be viewed as consisting of an encoder $\mathcal{E}_1$, a MISO channel and a decoder $\mathcal{D}_1$, as described in Section IV-A.1. Since the source is transmitted within distortion $D$ over the MISO channel, we have [2]

$$R(D) \leq C_{MISO}.$$

In order to bound $R(D)$ we first state a technical lemma.

Lemma 4.1: Let $\alpha \in (0, 1]$ and $\beta > 2$, and let

$$c(z) = H(z) + (\beta - 1) H\left( \frac{z}{\beta - 1} \right) = -2z\log(z) - (1 - z)\log(1 - z) + (\beta - 1)\log(\beta - 1) - (\beta - 1 - z)\log(\beta - 1 - z),$$

where $H(\cdot)$ is the entropy function. Then for $z \in [0, \alpha]$, $c(z) \geq 0$, and $c(z)$ attains its maximum at $z = \min\big(\alpha, \frac{\beta - 1}{\beta}\big)$.

Proof: By definition of $H(\cdot)$, $c(z) \geq 0$ for $z \in [0, \alpha]$. By examining

$$c'(z) = -2\log(z) + \log(1 - z) + \log(\beta - 1 - z) = \log\left( \frac{(1 - z)(\beta - 1 - z)}{z^2} \right),$$

it is easy to see that $c'(z) > 0$ for $z \in \big[0, \min\big(\alpha, \frac{\beta - 1}{\beta}\big)\big)$ and $c'(z) < 0$ otherwise. □
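Lemma 4.1 lends itself to a quick numerical check. The Python sketch below evaluates $c(z)$ on a uniform grid over $[0, \alpha]$ and returns the grid point with the largest value, which should lie at (or within one grid step of) $\min(\alpha, \frac{\beta - 1}{\beta})$. The function names, grid resolution and test values of $\alpha$ and $\beta$ are assumptions made for illustration only.

```python
import math


def entropy(a):
    """Entropy function H(a) = -a log a - (1 - a) log(1 - a), in nats."""
    if a <= 0.0 or a >= 1.0:
        return 0.0
    return -a * math.log(a) - (1.0 - a) * math.log(1.0 - a)


def c(z, beta):
    """c(z) = H(z) + (beta - 1) H(z / (beta - 1)) from Lemma 4.1."""
    return entropy(z) + (beta - 1.0) * entropy(z / (beta - 1.0))


def argmax_c(alpha, beta, steps=3000):
    """Grid point in [0, alpha] at which c(z) is largest."""
    grid = [alpha * k / steps for k in range(steps + 1)]
    return max(grid, key=lambda z: c(z, beta))


if __name__ == "__main__":
    for alpha, beta in ((0.9, 3.0), (0.5, 3.0)):
        predicted = min(alpha, (beta - 1.0) / beta)
        print(alpha, beta, argmax_c(alpha, beta), predicted)
```

The closed form follows from $c'(z) > 0 \iff (1 - z)(\beta - 1 - z) > z^2 \iff z < \frac{\beta - 1}{\beta}$, so the maximizer is the interior point $\frac{\beta - 1}{\beta}$ when it lies in $[0, \alpha]$ and the right endpoint $\alpha$ otherwise.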
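Thus we have

$$R(D) = \min_{\|\mathbf{c}\|_0 = \|\hat{\mathbf{c}}\|_0 = L, \ \|\mathbf{c} - \hat{\mathbf{c}}\|_0 \leq 2\alpha L} I(\mathbf{c}; \hat{\mathbf{c}}) = \min_{\|\mathbf{c}\|_0 = \|\hat{\mathbf{c}}\|_0 = L, \ \|\mathbf{c} - \hat{\mathbf{c}}\|_0 \leq 2\alpha L} \big( H(\mathbf{c}) - H(\mathbf{c} \mid \hat{\mathbf{c}}) \big)$$

$$\geq \log\binom{M}{L} - \log\left( \sum_{K=0}^{\alpha L} \binom{L}{K} \binom{M - L}{K} \right)$$

$$\geq M H\left( \frac{1}{\beta} \right) - \log(M + 1) - \log\left( \sum_{K=0}^{\alpha L} \exp\left( L H\left( \frac{K}{L} \right) + (M - L) H\left( \frac{K}{M - L} \right) \right) \right)$$

$$\geq \begin{cases} M H\left( \frac{1}{\beta} \right) - \log(M + 1) - \log(\alpha L + 1) - L \left( H(\alpha) + (\beta - 1) H\left( \frac{\alpha}{\beta - 1} \right) \right) & \text{if } \alpha < \frac{\beta - 1}{\beta} \\ -\log(M + 1) - \log(\alpha L + 1) & \text{if } \alpha \geq \frac{\beta - 1}{\beta} \end{cases}$$

where the first inequality follows since given $\hat{\mathbf{c}}$, $\mathbf{c}$ is among $\sum_{K=0}^{\alpha L} \binom{L}{K} \binom{M - L}{K}$ possible binary vectors within Hamming distance $2\alpha L$ from $\hat{\mathbf{c}}$. The second inequality follows from Inequality (27), and the third inequality follows by Lemma 4.1.

Thus $R(D) \geq L C_{\alpha,\beta} - o(L)$, where

$$C_{\alpha,\beta} = \begin{cases} \beta H\left( \frac{1}{\beta} \right) - H(\alpha) - (\beta - 1) H\left( \frac{\alpha}{\beta - 1} \right) & \text{if } \alpha < \frac{\beta - 1}{\beta} \\ 0 & \text{if } \alpha \geq \frac{\beta - 1}{\beta} \end{cases} \tag{28}$$

Therefore if $P_{e2}^{(M)} \to 0$, then

$$L C_{\alpha,\beta} - o(L) \leq N \log\left( 1 + \frac{P}{\nu^2} \right),$$

or equivalently for large $M$,

$$N \succ \frac{C_{\alpha,\beta}}{\log\left( 1 + \frac{P}{\nu^2} \right)} L.$$

The contrapositive statement proves Theorem 2.7.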

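3) Proof of Theorem 2.11 (Error Metric 3): For Error Metric 3, we assume that $\rho(\mathbf{x}) = \max_{i \in \mathcal{I}} |x_i|$ and $\mu(\mathbf{x}) = \min_{i \in \mathcal{I}} |x_i|$ both decay at rate $O\big(\frac{1}{\sqrt{L}}\big)$. Thus $P$ is constant. In the absence of this assumption, some terms of $\mathbf{x}$ can be asymptotically dominated by noise. Such terms are unimportant for recovery purposes, and therefore could be replaced by zeros (in the definition of $\mathbf{x}$) with no significant harm.

Let $\alpha(\gamma, \mathbf{x}) = \min\left( \frac{\gamma P}{L \mu^2(\mathbf{x})}, 1 \right)$. Let $P_{e3}^{(M)}$ denote the probability of error with respect to Error Metric 3 for $\mathbf{x} \in \mathbb{C}^M$. If $P_{e3}^{(M)} = 0$ and if an index set $\mathcal{J}$ is recovered, then $\sum_{k \in \mathcal{I} \setminus \mathcal{J}} |x_k|^2 < \gamma P$, where $\mathcal{I} = \mathrm{supp}(\mathbf{x})$. This implies that $|\mathcal{I} \setminus \mathcal{J}| \leq \alpha(\gamma, \mathbf{x}) L$. Thus $P_{e3}^{(M)} = 0$ implies that $P_{e2}^{(M)} = 0$ for Error Metric 2 with parameter $\alpha = \alpha(\gamma, \mathbf{x})$. As shown in Section IV-A.2, reliable recovery of $\mathbf{x}$ is not possible if

$$N \prec \frac{C_{\alpha(\gamma, \mathbf{x}), \beta}}{\log\left( 1 + \frac{P}{\nu^2} \right)} L,$$

where $C_{\alpha(\gamma, \mathbf{x}), \beta}$ is a constant (as defined in Equation (28)) that only depends on $\gamma$, $\mu(\mathbf{x})$ and $P$ for a given $\mathbf{x}$.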

V. Sublinear Regime

For completeness, we also state the equivalent theorems, when $L = o(M)$. The proofs follow the same steps as those in the linear regime. For the proofs of converse results, we use the bounds from Equation (21) instead of those of Equation (27).

Theorem 5.1: (Achievability for Error Metric 1) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 1 if $L\mu^4(\mathbf{x}^{(M)}) \to \infty$ as $L \to \infty$ and

$$N \succ C_1' L \log(M - L) \tag{29}$$

for some constant $C_1' > 0$ that depends only on $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: The proof is similar to that of Theorem 2.1, with $f(z)$ replaced by

$$k(z) = -2Lz\log z + 2Lz + Lz\log\left( \frac{M - L}{L} \right) - \frac{N - L}{4} \left( \frac{Lz\mu^2(\mathbf{x}) - C\mu^2(\mathbf{x})}{Lz\mu^2(\mathbf{x}) + \nu^2} \right)^2.$$

The behavior of $k(z)$, $k'(z)$ and $k''(z)$ at the endpoints $\{\frac{1}{L}, 1\}$ is the same as that in the proof of Theorem 2.1 whenever $N = C_1' L \log(M - L)$. The result follows. □
Theorem 5.2: (Converse for Error Metric 1) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 1 if

$$N \prec C_2' \frac{L \log(M - L)}{\log P} \tag{30}$$

for some constant $C_2' > 0$ that depends only on $P$ and $\nu$.

Proof: The proof is similar to that of Theorem 2.3. □
Theorem 5.3: (Achievability for Error Metric 2) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given such that $L\mu^2(\mathbf{x}^{(M)})$ and $P$ are constant. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 2 if

$$N \succ C_3' L \log(M - L) \tag{31}$$

for some constant $C_3' > 0$ that depends only on $\alpha$, $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: The proof is similar to that of Theorem 2.5. □

Theorem 5.4: (Converse for Error Metric 2) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given such that $P$ is constant. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 2 if

$$N \prec C_4' L \log(M - L) \tag{32}$$

for some constant $C_4' > 0$ that depends only on $\alpha$, $P$ and $\nu$.
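Proof: We have the following technical lemma.

Lemma 5.5: Let $\alpha \in (0, 1]$ and $L = o(M)$, and let

$$d(z) = 2z - 2z\log(z) + z\log\left( \frac{M - L}{L} \right).$$

Then for $z \in [0, \alpha]$, and for sufficiently large $M$, $d(z)$ attains its maximum at $z = \alpha$.

Proof: By examining

$$d'(z) = -2\log(z) + \log\left( \frac{M - L}{L} \right) = \log\left( \frac{M - L}{L z^2} \right),$$

it is easy to see that $d'(z) \succ 0$ for sufficiently large $M$. □

Continuation of the proof of the theorem: Thus we have,

$$R(D) = \min_{\|\mathbf{c}\|_0 = \|\hat{\mathbf{c}}\|_0 = L, \ \|\mathbf{c} - \hat{\mathbf{c}}\|_0 \leq 2\alpha L} I(\mathbf{c}; \hat{\mathbf{c}}) = \min_{\|\mathbf{c}\|_0 = \|\hat{\mathbf{c}}\|_0 = L, \ \|\mathbf{c} - \hat{\mathbf{c}}\|_0 \leq 2\alpha L} \big( H(\mathbf{c}) - H(\mathbf{c} \mid \hat{\mathbf{c}}) \big)$$

$$\geq L\log\left( \frac{M}{L} \right) - \log\left( \sum_{K=0}^{\alpha L} \exp\left( K\log\left( \frac{Le}{K} \right) + K\log\left( \frac{(M - L)e}{K} \right) \right) \right)$$

$$\geq L\log(M) - \alpha L\log(M - L) - o(L\log M) \geq (1 - \alpha) L\log(M - L) - o(L\log M),$$

where the first inequality follows from Inequality (21), and the second inequality follows by Lemma 5.5 for sufficiently large $M$. The rest of the proof is analogous to that of Theorem 2.7. □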



Theorem 5.6: (Achievability for Error Metric 3) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given such that $P$ is constant. Then asymptotic reliable recovery is possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 3 if

$$N \succ C_5' L \log(M - L) \tag{33}$$

for some constant $C_5' > 0$ that depends only on $\gamma$, $P$ and $\nu$.

Proof: The proof is similar to that of Theorem 2.9. □

Theorem 5.7: (Converse for Error Metric 3) Let a sequence of sparse vectors, $\{\mathbf{x}^{(M)} \in \mathbb{C}^M\}_M$ with $\|\mathbf{x}^{(M)}\|_0 = L = o(M)$ be given such that $P$ is constant and the non-zero terms decay to zero at the same rate. Then asymptotic reliable recovery is not possible for $\{\mathbf{x}^{(M)}\}$ with respect to Error Metric 3 if

$$N \prec C_6' L \log(M - L) \tag{34}$$

for some constant $C_6' > 0$ that depends only on $\gamma$, $P$, $\mu(\mathbf{x}^{(M)})$ and $\nu$.

Proof: As in the proof of Theorem 2.11, we let $\alpha(\gamma, \mathbf{x}) = \min\left( \frac{\gamma P}{L\mu^2(\mathbf{x})}, 1 \right)$, and conclude that $P_{e3}^{(M)} = 0$ implies that $P_{e2}^{(M)} = 0$ for Error Metric 2 with parameter $\alpha = \alpha(\gamma, \mathbf{x})$. The rest of the proof is analogous to that of Theorem 5.4. □